ABSTRACT The coronavirus replicase gene encodes
one or two papain-like proteases (termed
PL1pro and PL2pro) implicated in the N-terminal
processing of the replicase polyprotein and thus 
contributing to the formation of the viral replicase 
complex that mediates genome replication. Using 
consensus fold recognition with the 3D-JURY meta- 
predictor followed by model building and refine- 
ment, we developed a structural model for the single 
PLpro present in the severe acute respiratory syn- 
drome coronavirus (SCoV) genome, based on signifi- 
cant structural relationships to the catalytic core 
domain of HAUSP, a ubiquitin-specific protease 
(USP). By combining the SCoV PLpro model with 
comparative sequence analyses we show that all 
currently known coronaviral PLpros can be classi- 
fied into two groups according to their binding site 
architectures. One group includes all PL2pros and
some of the PL1pros, which are characterized by a
restricted USP-like binding site. This group is designated
the R-group. The remaining PL1pros, found in
some of the coronaviruses, form the other group,
which features a more open papain-like binding site and
is referred to as the O-group. This two-group, binding
site-based classification is consistent with experimental
data accumulated to date for the specificity
of PLpro-mediated polyprotein processing and PL- 
pro inhibition. It also provides an independent evalu- 
ation of the similarity-based annotation of PLpro- 
mediated cleavage sites, as well as a basis for 
comparison with previous groupings based on phy- 
logenetic analyses. Proteins 2006;62:760-775. 
INTRODUCTION


Coronaviruses are enveloped, single-stranded, positive-sense
RNA viruses. Besides economically important veterinary
pathogens, they include human coronaviruses
(HCoVs), which are a cause of respiratory tract diseases,
including the common cold, and occasional enteric infections.
The identification of a coronavirus as the infectious
agent of severe acute respiratory syndrome (SARS), a
life-threatening form of atypical pneumonia, has led to a
renewed interest in coronaviruses. Despite successful
containment of the first SARS epidemic by quarantine
measures, human SARS coronavirus (SCoV) infections
persist without any specific therapy at hand. Interferon
treatment is currently regarded as most useful,
whereas the broad-spectrum antiviral nucleoside analog
ribavirin and the HIV protease inhibitor combination
lopinavir/ritonavir proved ineffective.

Upon cell infection, the viral replicase gene is translated
directly from the viral genome. Autocatalytic processing
by two proteases, which are part of the replicase polyprotein,
releases nonstructural proteins (nsps). These
form a membrane-bound RNA replication complex.
One of the two coronaviral proteases, the 3CLpro, has
already generated much interest as a target. It resides in
nsp5, and, after autocleavage, releases the downstream
replicase subunits. The processing of the amino-proximal
nsps is carried out by one or two paralogous protease
domains within nsp3, the largest of the nsps. They
are defined by homology to the papain-like fold and
constitute the peptidase family C16. Mutational analyses
support the presence of a Cys-His catalytic dyad.
Most coronaviruses harbor two such papain-like protease
domains, PL1pro and PL2pro, whereas SCoV and the
avian infectious bronchitis coronavirus (IBV) utilize only
one, which is equivalent to PL2pro. PL2pro may cleave
down- and upstream of nsp3, but only upstream
cleavages were associated with PL1pro. Additional
nsp3 domains include the X domain, which is
predicted to constitute an RNA-processing enzyme, and
the hydrophobic Y domain, which likely anchors nsp3 to
membranes. The PLpro cleavage products nsp1-3 all
colocalize with the replication complex.

The synthesis of both negative- and plus-strand virus
RNA requires ongoing viral protein production, and
complete processing of the replicase N-terminal nsps appears
to be essential for optimal virus growth. The
development of selective PLpro inhibitors may, therefore,
provide a new class of antivirals. However, little is
known about the molecular basis of PLpro cleavage site
sequence recognition, or about the significance of the existence
of two PLpro domains, which may or may not exhibit
overlapping target site selectivities. The cleavage site
sequence specificity is limited to a preference for small
residues (Gly, Ala) in the P1 and P2 positions for most but
not all coronaviral PLpros. For a structural
analysis of PLpros, Herold and colleagues built a
homology model for the PL1pro domain of HCoV-229E
based on the papain structure. The authors modeled an
additional ~50-residue sequence, which connects the amino-
and carboxy-terminal subdomains of the putative
papain fold, as a Zn-ribbon. Indeed, the recombinant
PL1pro domain binds equimolar amounts of zinc, and
mutation of the predicted zinc-binding motif abolished
catalytic activity. Recently, we have identified a structural
relationship between the SCoV PLpro and the
catalytic core domain of the papain-like herpesvirus-associated
ubiquitin-specific protease (HAUSP), also known
as USP7, of the C19 family of ubiquitin-specific proteases
(USPs). Instead of a classical Zn-ribbon, as proposed for
the PLpros by Herold and coworkers, HAUSP contains a
circularly permuted Zn-ribbon-like domain inserted between
the two subdomains of the papain fold. We further
recognized that the binding site complementarity of
HAUSP to the C-terminal ubiquitin sequence LRGG
matches the narrow specificity profile (LXGG) of SCoV
PLpro.22-25:27 

In this study, we survey in detail the substrate interactions
predicted in the binding site of SCoV PLpro, particularly
in the S1 and S2 subsites. The structural framework
provided by the modeled SCoV PLpro-binding site is then 
combined with comparative sequence analyses in order to 
understand specificity data available for other coronaviral 
PLpros. Despite what their names seem to imply, PL1pro 
and PL2pro do not represent distinctive subgroups of the 
coronaviral papain-like enzymes. Indeed, it has not been 
possible so far to cluster the PL1pro and PL2pro domains 
into specific groups based on clear functional comparisons. 
Our analysis reveals a novel classification of all currently 
known coronaviral PLpros, which is based on their binding- 
site characteristics. This classification is further used for 
an independent evaluation of the current annotation of 
coronaviral PLpro cleavage sites from public databases. 


MATERIALS AND METHODS 
Coronavirus Nomenclature and Sequence 
Accession Numbers 


Coronavirus abbreviations, together with SwissProt (SW;
http://www.expasy.org/sprot) or GenBank (GB;
http://www.ncbi.nlm.nih.gov/entrez) accession numbers used in
this article are as follows: SCoV for SARS coronavirus 
(strain Tor2; SW: P59641; GB: NC_004718), HCoV for 
human coronavirus strains 229E (SW: Q05002; GB: 
NC_002645), NL (GB: AY518894), OC43 (GB: NC_005147) 
and HKU1 (GB: AY597011), BCoV for bovine coronavirus 
(strain Ent; SW: Q91A29; GB: NC_003045), MHV for 
murine hepatitis virus (strain A59; SW: P16342; GB: 
NC_001846), TGEV for transmissible gastroenteritis virus 
(SW: Q9TW06; GB: NC_002306), PEDV for porcine epi- 
demic diarrhea virus (SW: Q91AV2; GB: NC_003436), and 
IBV for infectious bronchitis virus (strain Beaudette; SW:
P27920; GB: NC_001451). Other strains for SCoV, BCoV,
MHV, and IBV were omitted from the analysis in order to
decrease redundancy in the datasets of sequences for the 
PLpros and their respective predicted cleavage sites. 


Structural Bioinformatics 


Fold detection was carried out at the Structure Prediction
Meta Server (http://bioinfo.pl/meta). Consensus
sequence-to-structure scoring was achieved with the 3D-JURY
method running in the best-model-scoring mode
over the default set of eight threading servers, as well as
over all prediction servers available including other
meta-predictors. The reported top-ranked query-to-template
sequence alignments were further refined manually
by considering (1) the structure-based sequence alignment
of the identified templates, (2) the sequence alignment of
the coronaviral PLpro family generated with the CLUSTAL W
program, and (3) the secondary structure alignment.
Secondary structure prediction was obtained with three
methods, PROFsec, PSI-PRED, and SAM-T99, followed by
a consensus derived by majority voting. The final
sequence-to-structure alignment of SCoV and other coronaviral
PLpros to the identified template structures is given
in Figure 1, including predicted and experimental second- 
ary structure elements. This alignment formed the basis 
for the 3D homology modeling of the SCoV PLpro struc- 
ture. 
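
The consensus step described above (three secondary structure predictions combined by majority voting) can be illustrated with a minimal sketch. It assumes that each method returns a per-residue string over the states H (helix), E (strand), and C (coil), that the strings are already aligned to the same sequence, and that a three-way disagreement falls back to coil. The function name and the tie-breaking rule are illustrative assumptions, not the procedure used in this work.

```python
from collections import Counter


def consensus_secondary_structure(predictions):
    """Per-residue majority vote over equal-length H/E/C prediction strings.

    Assumption (not from the paper): when no state reaches a strict
    majority, the residue is assigned to coil ("C").
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    consensus = []
    for states in zip(*predictions):
        # most_common(1) returns the single most frequent state and its count
        (state, votes), = Counter(states).most_common(1)
        # A strict majority means more than half of the methods agree
        consensus.append(state if votes * 2 > len(states) else "C")
    return "".join(consensus)


# Hypothetical predictions for a six-residue stretch
print(consensus_secondary_structure(["HHHEEC", "HHCEEC", "HECEEH"]))
```

With three methods, a state is accepted whenever at least two of them agree, which is the usual reading of majority voting for an odd number of predictors.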


Building and Refinement of the SCoV PLpro Model 


We have previously reported a short outline for the
construction and refinement of the SCoV PLpro homology
model (residues K1632-E1847). In brief, the SCoV PLpro
model, comprising the Zn-ribbon domain inserted in
the middle of the protease domain, was built as a chimera
of the two template structures, HAUSP and foot-and-mouth
disease virus leader protease (FMDV Lpro), identified
by the 3D-JURY fold recognition. Detailed procedures
and atomic coordinates of the final model complexed with 
full-length ubiquitin aldehyde (Ubal) are given in this 
report. 

Structural manipulations were performed with the
SYBYL 6.6 molecular modeling software (Tripos, Inc., St.
Louis, MO). First, the homology modeling program COMPOSER
in SYBYL was employed in order to fit various
regions of the SCoV PLpro sequence onto the 2.3 Å-resolution
crystal structure of the core catalytic domain of
HAUSP complexed with ubiquitin aldehyde (PDB code
1NBF), and onto the 1.9 Å-resolution crystal structure of
FMDV Lpro (1QMY), following the sequence alignment
shown in Figure 1. Based on sequence similarities, deletions
and/or insertions, and the disposition of secondary
structure elements, the following sequence-to-template
assignment was adopted: (a) the SCoV PLpro segments
K1632-E1701, F1798-E1803, and T1814-P1839 were
taken from FMDV Lpro segments E30-E96, F137-F142,
and V150-D176, respectively, largely covering the N- and
C-terminal subdomains; (b) the SCoV PLpro segments
L1702-T1797, Y1804-Y1813, and V1840-E1847 were
Fig. 1. Multiple sequence alignment of coronaviral PLpros to a structure-based sequence alignment of
HAUSP, papain, and FMDV Lpro. Predicted secondary structure elements for SCoV PLpro are shown in gray,
and the actual secondary structure elements for HAUSP and FMDV Lpro (PDB codes given in
parentheses) are shown in black above the alignment. β-strands are represented by arrows, α-helices by
cylinders, and coils by lines. Selected secondary structure elements referred to in the text are labeled. Active
site catalytic triad residues are shown on red background. Putative Zn-chelating Cys residues in the Zn-ribbon
domain are highlighted on yellow background, as are the two reminiscent Zn-chelating residues in HAUSP.
Boundaries of the Zn-ribbon domain are indicated by vertical red arrows. The position of the putative
oxyanion-stabilizing residue is indicated with a red dot, and those predicted to engage in interactions at
substrate positions P1 through P4 (see Fig. 3) are indicated with blue dots, except for P1788 and T1841 of
SCoV PLpro, which are indicated with blue circles. Papain insertions in the alignment are shown above its
sequence, and those labeled 1 through 4 correspond with those indicated in Figure 2 on the papain structure.
Residues identical in half or more of the coronaviral PLpro sequences are in white on dark gray background
and those conserved in half or more of the coronaviral PLpro sequences are on light gray background, based on
the BOXSHADE program (http://www.ch.embnet.org). The conservation highlighting is carried over to the
sequences of HAUSP, FMDV Lpro and papain. HCoV refers to HCoV-229E.
taken from HAUSP segments Q293-E429, H456-Y465,
and A513-R520, respectively. In all, these elements in
HAUSP form the substrate-binding loop α4-α5 and part of
the helix α5 in the N-terminal subdomain, the finger
domain, the substrate-covering loop immediately preceding
the catalytic histidine, and the two β-strands from the
C-terminal subdomain adjacent to the finger domain.
Loops in the SCoV PLpro, corresponding to insertions/deletions
or junctions relative to the templates, were
constructed by searching protein structures from the
Protein Data Bank (PDB; http://www.rcsb.org) using the
PROTEIN LOOPS program in SYBYL. They include the
following sequences: P1636-Q1637, E1664-A1671, and
R1680-D1682 (in the N-terminal subdomain); A1716-E1719
in the region connecting the N-terminal subdomain
to the finger domain of HAUSP; Y1747-L1751 (corresponding
to the finger domain of HAUSP); and P1788-A1789,
G1796-F1798, and K1819-E1820 (in the C-terminal subdomain).
For selecting loop conformations, the search
output was examined for root-mean-square (rms) deviations
at the anchor positions, sequence homology, as well
as suitability for the overall tertiary structure.
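
As a bookkeeping aid, the sequence-to-template assignment listed above can be written down as data and checked for gaps or overlaps. The sketch below is only an illustration: the segment boundaries are the SCoV residue numbers quoted in the text, while the function names and the tuple layout are assumptions introduced here.

```python
# SCoV PLpro segments (first residue, last residue, template), using the
# residue numbers quoted in the text for the chimeric model.
CHIMERA_SEGMENTS = [
    (1632, 1701, "FMDV Lpro"),
    (1702, 1797, "HAUSP"),
    (1798, 1803, "FMDV Lpro"),
    (1804, 1813, "HAUSP"),
    (1814, 1839, "FMDV Lpro"),
    (1840, 1847, "HAUSP"),
]


def check_tiling(segments, first, last):
    """Verify that the segments cover first..last with no gap or overlap.

    Returns the number of residues covered, or raises ValueError.
    """
    expected_start = first
    for start, end, template in sorted(segments):
        if end < start:
            raise ValueError(f"inverted segment {start}-{end} ({template})")
        if start != expected_start:
            raise ValueError(
                f"gap or overlap before residue {start} ({template})"
            )
        expected_start = end + 1
    if expected_start != last + 1:
        raise ValueError(f"segments end at {expected_start - 1}, not {last}")
    return last - first + 1


def residues_per_template(segments):
    """Count how many model residues are assigned to each template."""
    totals = {}
    for start, end, template in segments:
        totals[template] = totals.get(template, 0) + (end - start + 1)
    return totals


print(check_tiling(CHIMERA_SEGMENTS, 1632, 1847))
print(residues_per_template(CHIMERA_SEGMENTS))
```

The six segments tile the modeled range K1632-E1847 exactly. The separately built loops listed above fall inside these segments, so they are not part of this check.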

Using the superimposed HAUSP template structure 
with bound Ubal, the C-terminal portion of the Ubal 
(RLRG-Glycinal) was docked in the SCoV PLpro-binding 
site as a thiohemiacetal adduct covalently bound to the 
catalytic cysteine (C1651). This ligand also mimics the 
SCoV PLpro cleavage site sequence motif LXGG (positions 
P4 to P1). The N- and C-termini of protein and
ligand were blocked with acetyl and methylamino groups, 
respectively. Several SCoV PLpro side chains were manu- 
ally repacked to improve van der Waals contacts and 
hydrogen bonding. Hydrogen atoms were added explicitly, 
and the polar hydrogens were oriented to favor hydrogen 
bonding. The ionization state at physiological pH was 
adopted. The catalytic histidine was treated as neutral due 
to the covalent adduct formation at the catalytic cysteine. 
Accordingly, a hydroxyl group was considered instead of 
an oxyanion in the thiohemiacetal group. Given the importance
of the putative Zn-chelating cysteines for the trans-cleavage
activity of HCoV, we also carried out initial
docking and coordination of a Zn ion to SCoV PLpro based 
on structural superimpositions with two representative 
C4-type Zn-ribbons from the transcription elongation fac- 
tor SII (PDB code 1TFI) and RNA polymerase II subunit 9 
(1QYP), and with the circularly permuted C4-type Zn- 
ribbon of the silent information regulator 2, Sir2 (1ICI). 

Model refinement was carried out by gradual structural
relaxation using a stepwise energy minimization protocol
and employing the AMBER all-atom molecular mechanics
force field. (More details on the energy refinement procedure
and the docking of ubiquitin to SCoV PLpro can be
found in the Supplementary Material.) In terms of the
basic stereochemical quality of the refined model, 95% of
the non-glycine residues of SCoV PLpro reside in the most
favored (75%) and allowed (20%) regions of the Ramachandran
plot, and only one non-glycine residue (E1820) is
found in the disallowed region. The refined structure
preserves the number and general disposition of predicted
secondary structure elements.


RESULTS AND DISCUSSION 
Structural Relationships of SCoV PLpro 


We have recently mined the PDB content for further
structure-to-function annotation of the coronaviral PLpros.
The structure of the catalytic core domain of
HAUSP was scored by 3D-JURY well above the significance
threshold of 50, which is considered to result in a
prediction accuracy of above 90%. Simple application of
standard homology tools (e.g., PSI-BLAST) failed to detect
any statistically significant relationship between SCoV
PLpro and any of the known protein structures. The
structure of FMDV Lpro was ranked second by 3D-JURY,
albeit with a borderline significant score. The
structures of HAUSP and FMDV Lpro each feature a
papain-like domain, with an additional circularly permuted
Zn-ribbon domain inserted between the two subdomains
of the papain fold only in HAUSP. 3D-JURY scored
only the FMDV Lpro structure above the significance
threshold when the protease domain of SCoV PLpro alone
was queried (i.e., after excision of the sequence S1720-S1779).

As already mentioned, this additional inserted domain
was previously proposed to adopt a Zn-ribbon fold. Our
sequence alignment (Fig. 1) only detects a cysteine residue
in the first of the four putative Zn-chelating positions in
the HAUSP sequence. However, when we extended this
comparison to related USPs, it became clear that all four
positions are occupied by cysteine residues in ~68% of the
251 members of the C19 family as aligned in the MEROPS
database. We recognized that in the context of the
HAUSP finger domain structure, these residues would
form the Zn-binding motif of a circularly permuted Zn-ribbon.
Independently, a structural relationship between
the finger domain of HAUSP and the circularly
permuted C4-type Zn-ribbon has recently been recognized
by Krishna and Grishin.

Although no statistically significant scores were reported
by 3D-JURY for the putative Zn-ribbon domain of
SCoV PLpro alone (sequence S1720-S1779), all the top-ranked
structures represented rubredoxins (e.g., PDB
codes 1824, 1SMM, 1BQ8), which, like HAUSP, feature a
circularly permuted Zn-ribbon domain. Together with
Zn-β-ribbons, they belong to the rubredoxin-like fold family
according to the SCOP database. Members of this
family contain two C(X), 2C motifs that typically coordinate
Fe2+/Fe3+ in rubredoxins and Zn2+ in Zn-ribbons. A
more in-depth discussion of the circularly permuted Zn-ribbon
is given in Appendix A.


Overall Modeled Structure of SCoV PLpro


A 3D model of SCoV PLpro (K1632—E1847) was con- 
structed as a chimera between the HAUSP and FMDV 
Lpro template structures. Figure 2 compares the refined 
model of SCoV PLpro with the crystal structures of 
HAUSP, FMDV Lpro and papain. Relative to the SCoV 
PLpro protease domain (i.e., excluding the Zn-ribbon do- 
Fig. 2. Comparison of the modeled structure of the catalytic core domain of SCoV PLpro with the crystal
structures of the catalytic core domain of HAUSP (PDB code 1NBF), FMDV Lpro (1QMY), and papain (1PPN).
The protease domains are colored in cyan, the insertions in the middle of the protease domain (residues
S1720-S1779 in SCoV PLpro, R325-P399 in HAUSP, T113-E123 in FMDV Lpro, and G79-G109 in papain)
are in red, and the C-terminal extension of HAUSP (S552-K554) is rendered in white. Catalytic triads are
shown in ball-and-stick representation. The four cysteine residues that coordinate the Zn ion (magenta sphere)
in the SCoV PLpro model are also shown. Structural differences in papain relative to the other three enzymes
(see also Fig. 1) are numbered 1 to 4: (1) the sequence preceding the catalytic cysteine and harboring the
oxyanion hole residue; (2) the insertion between the N- and C-terminal subdomains of the protease domain that
folds back onto the N-terminal subdomain rather than extending the β-sheet of the C-terminal subdomain as in
the other structures; (3) a long loop folded onto the C-terminal subdomain and replacing the shorter,
substrate-covering loop in the other structures; (4) a Trp-containing eight-residue loop inserted after the
asparagine of its catalytic triad and shielding it from solvent (while the corresponding aspartate in the other
structures is solvent exposed).


main), the larger protease domain of HAUSP has two 
additional α-helices in the N-terminal subdomain and
three additional β-strands in the C-terminal subdomain,
together with longer intervening loops. In fact, the smaller
FMDV Lpro is a more suitable template for most of the
SCoV PLpro protease domain, because of its similar size
and an exact match of secondary structure elements.
However, the residues predicted to shape the substrate-binding
subsites S1 through S4 in SCoV PLpro (described
in more detail in the following section) clearly resemble the
HAUSP-binding site that accommodates the ubiquitin
C-terminal sequence LRGG. Among the several sizable
differences, which led to the prediction of a less-elaborated
structure of SCoV PLpro compared to HAUSP, we noted a
shorter loop after the first β-strand of the C-terminal
subdomain in the former protease. The corresponding loop


in HAUSP (B8-89) becomes ordered as a B-hairpin (80-B0’) 
upon ubiquitin binding, presumably, because of its con- 
tacts with the ubiquitin residues in the P, through P, 
positions. The β10-β11 hairpin loop of HAUSP, however,
which also covers the ubiquitin C-terminal residues, appears
conserved in SCoV PLpro, but is three residues
shorter in FMDV Lpro (see also Fig. 1). Figures 1 and 2 
further highlight significant differences between the SCoV 
PLpro model and papain structure (see Fig. 2 for details). 
The presence of a Zn-ribbon domain in SCoV PLpro is
compatible with the existence of a circularly permuted
Zn-ribbon domain in HAUSP, in terms of its size,
sequence location, and predicted secondary structure. As
in the HAUSP template, the Zn-ribbon domain of SCoV
PLpro extends the β-sheet in the C-terminal subdomain of
the protease domain by a parallel β-strand, which serves
to anchor the orientation of the Zn-ribbon domain relative
to the protease domain. Further interdomain contacts
established in HAUSP between an additional α-helix (α7)
in the Zn-ribbon domain and a longer loop β9-β10 in the
protease domain are absent in our model of SCoV PLpro.
In FMDV Lpro, the inserted Zn-ribbon domain is reduced
to just one β-strand that preserves the parallel interaction
with the β-sheet of the protease domain. Further discussion
on the predicted crossover loop conformation of the
SCoV PLpro circularly permuted Zn-ribbon domain, and 
its implications for interdomain orientation, is given in 
Appendix A. 


Predicted Substrate-binding Site of SCoV PLpro 


The structure of the catalytic core domain of HAUSP, in
complex with Ubal, is a suitable template for reliable
modeling of the substrate-binding cleft of SCoV PLpro. In
order to allow a detailed view of specific enzyme-substrate
interactions in the nonprimed side of the binding groove,
structural refinement of SCoV PLpro was carried out in
the presence of RLRG-Glycinal bound covalently to the
catalytic cysteine as a thiohemiacetal adduct and interacting
with subsites S1 through S4. As we have pointed out
previously, this peptidyl aldehyde not only corresponds
to the Ubal C-terminal sequence, but also matches the
general P4-P1 specificity motif of SCoV PLpro, LXGG,
derived from the predicted PLpro-processing sites of the
polyprotein. The details of the substrate interactions
in subsites S1 to S4 are shown in Figure 3.
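
For readers who want to locate candidate sites matching the LXGG motif in a sequence of interest, a minimal sketch follows. It assumes a one-letter amino acid string, treats X as any residue, and places the scissile bond after the second glycine (P1). The function name and the toy sequence are hypothetical, and a motif hit alone is of course not evidence of cleavage.

```python
import re

# Lookahead pattern so that overlapping occurrences are also reported.
LXGG = re.compile(r"(?=(L[A-Z]GG))")


def find_lxgg_sites(sequence):
    """Return (P4 position, motif, P1 position) for each LXGG occurrence.

    Positions are 1-based. The P1 position is the second glycine, i.e. the
    residue after which cleavage would be expected for this motif.
    """
    hits = []
    for match in LXGG.finditer(sequence.upper()):
        p4_position = match.start() + 1
        hits.append((p4_position, match.group(1), p4_position + 3))
    return hits


# Toy sequence, not taken from any coronavirus polyprotein
for p4, motif, p1 in find_lxgg_sites("MKLRGGAAELNGGA"):
    print(f"{motif}: P4 at residue {p4}, P1 at residue {p1}")
```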

In the P1 substrate position, the Glycinal moiety is
covalently bound to the catalytic residue C1651, which
together with H1812 and D1826 forms the putative catalytic
triad in a canonical spatial arrangement. The tetrahedral
hemiacetal oxygen atom is stabilized by three hydrogen
bonds, namely, with the indole NH group of the
oxyanion hole residue W1646, the main chain NH group of
C1651, and the side chain amide group of N1649. Six of the
seven main-chain heteroatoms of the substrate P1 to P4
positions are engaged in direct intermolecular hydrogen
bonds with enzyme residues G1811 (one H-bond to P1
backbone), G1703 (two H-bonds to P2 backbone), Y1804
(one H-bond to P3 backbone), and D1704 and Y1813 (two
H-bonds to P4 backbone). Such an extensive hydrogen-bonding
network indicates not only high levels of complementarity
in the recognition of the substrate main chain,
but also that the substrate can achieve substantial binding
affinity without additional interactions through its side
chains.

Furthermore, the side chains of residues N1649 and
L1702 restrict the S1 pocket to hinder the accommodation
of large P1 side chains. In the S2 subsite, the side chains of
residues Y1813 and Y1804 completely occlude the S2
pocket and clearly prevent binding of P2 side chains larger
than Ala. As mentioned earlier, these two Tyr side chains
are also involved in the anchoring of the substrate main
chain at the P4 and P3 positions, respectively. In addition,
Y1813 and Y1804 side chains are conformationally restricted,
particularly, the more buried Y1813 adjacent to
the catalytic H1812 residue. The available space around
the P2 main chain is also reduced by the β-hairpin loop
between Y1804 and Y1813. Closure of the loop on the
substrate main chain also brings it in contact with the
L1702 side chain, effectively creating a narrow tunnel into
which the P1-P2 di-glycine can fit snugly [Fig. 3(c)]. From a
structural viewpoint, the overall importance in determining
the strict P2 specificity appears to be Y1813 > Y1804 >
Y1804-Y1813 loop. The model clearly explains the observed
S1 and S2 specificities of SCoV PLpro for glycine
residues.

The Arg side chain modeled at the P3 substrate position
is largely solvent-accessible, which is in agreement with
the consensus processing site sequence for SCoV PLpro
containing a variable P3 residue. The only specific
interaction of the P3 Arg side chain is a long hydrogen bond
(not shown) between its guanidinium group and the substrate-covering
loop Y1804-Y1813 of the enzyme. Leu is
conserved at the P4 position of the three polyprotein
sites processed by SCoV PLpro. The modeled P4 Leu side
chain binds in a relatively defined pocket of the enzyme,
where it contacts the side chains of residues Y1804, as well
as P1788 and T1841. Low levels of target-template sequence
conservation (see Fig. 1) decrease the prediction
reliability for the contacts with the latter two side chains.
The P5 Arg side chain was readily modeled in a salt-bridge
interaction with the E1707 carboxylate group (not shown).
Because of its surface exposure, it is not expected that this
electrostatic interaction would play a major role in substrate
affinity and specificity. Accordingly, different P5
residues are found in the putative SCoV PLpro cleavage
site sequences.

The HAUSP-like topology of the SCoV PLpro-binding
site differs significantly from that of papain. In papain,
SCoV PLpro residues D1704, Y1804, and Y1813 are
replaced with residues Y67, V133, and A160, respectively.
This precludes hydrogen-bond formation between
papain and the substrate main chain in the P3 and P4
positions, as outlined above for SCoV PLpro. Importantly,
substitutions of the S2-occluding residues Y1804
and Y1813 of SCoV PLpro result in a well-shaped
substrate-accessible S2 pocket in papain, suitable for the
accommodation of bulky hydrophobic P2 side chains,
such as Leu or Phe. Instead of SCoV PLpro residues
N1649 and L1702, which sterically block its S1 pocket,
glycine residues are found at the corresponding positions
in papain (Gly23, Gly65) and related cathepsins,
which tolerate a variety of P1 side chains in the open S1
subsite. Mutation of either of these two glycine residues in
cathepsin B to the corresponding non-glycine residues at
these positions in papaya proteinase IV, which only
accepts Gly at the substrate P1 position, has been shown
to restrict the P1 specificity of cathepsin B to glycine.
The β-hairpin loop Y1804-Y1813 of SCoV PLpro is
replaced in papain by a long insertion (labeled 3 in Figs.
1 and 2) that folds against the C-terminal subdomain of
the protease. Also different from SCoV PLpro, papain
does not have a defined S4 subsite, in agreement with its
broad specificity at the substrate P4 position.
Fig. 3. Substrate recognition in the subsites S1 through S4 of SCoV PLpro. (a) Stereo view of the modeled
interactions between the LRG-Glycinal peptidyl aldehyde and the substrate-binding site of SCoV PLpro.
Carbon atoms are colored in cyan for the protease and in green for the ligand. Hydrogen bonds are indicated
with dashed lines. The color scheme applied for rendering the protein chains is as in Figure 2, except for the
protease domain shown here in white. (b) Schematic representation of the interactions shown in (a). Protein
residues are shown with thin lines, the ligand is shown with thick lines, and hydrogen bonds are shown with
dashed lines. (c) Steric fit of LRG-Glycinal in the substrate-binding site of SCoV PLpro. The protease is
represented by its molecular surface, and the ligand is shown as a CPK model in the middle panel and with
sticks in the two side panels. The view in the central panel is similar to the orientation shown in (a). The left and
right panels are opposite side views as indicated by the red arrows, through the narrow tunnel in the S1-S2
region.
Fig. 4. Binding site-based classification of coronaviral PLpros. Sequence alignment of binding site
signature motifs of coronaviral PLpros predicted from the SCoV PLpro model (cf. Fig. 3). Key substrate-binding
residues are highlighted on black background. The USP-like binding site signature residues are in cyan, and
the papain-like binding site signature residues in yellow. The conserved catalytic Cys and His residues, as well
as the variable residue at the putative oxyanion hole position, are in red. In the grouping of coronaviral PLpros,
R indicates a restricted, USP-like binding site, and O is used to indicate an open, papain-like binding site
according to the nature of, primarily, the S2 subsite. As reference, the corresponding blocks of representative
USPs are aligned below the R-group coronaviral PLpros and of representative papain-like proteases and
FMDV Lpro below the O-group. Insertions in the papain-like enzymes are indicated by the length of their
sequences. Entries shown are from the SwissProt database: USP7 (Human; SW: Q93009); USP5 (Human;
SW: P45974); USP14 (Human; SW: P54578); USP18 (Mouse; SW: Q9WTV6); DOA4 (Saccharomyces
cerevisiae; SW: P32571); Papain (Carica papaya; SW: P00784); Cathepsin L (Human; SW: P07711);
Cathepsin K (Human; SW: P43235); Cathepsin B (Human; SW: P07858); and FMDV Lpro (Foot-and-mouth
disease virus strain O1; SW: P03305). See Materials and Methods for coronavirus nomenclature and
sequence accession numbers.


Comparative Analysis and Classification of 
Coronaviral PLpros 


The modeled architecture and interactions in the non- 
primed side of the SCoV PLpro substrate-binding cleft, 
combined with the multiple sequence alignment presented 
in Figure 1, provide a structural framework for compara- 
tive analysis and classification of the other currently 
known coronaviral PLpros. The resulting binding site 
signature motifs, which characterize the entire coronavi- 
ral PLpro family, are delineated in Figure 4. SCoV PLpro 
residue numbering will be used in the following compari- 
sons. 

One group of coronaviral PLpros is characterized by a HAUSP-like binding site and includes,
besides SCoV PLpro, the PL2pros from HCoV-229E, HCoV-NL, HCoV-OC43, HCoV-HKU1, BCoV, MHV,
TGEV, and PEDV and the PL1pros from HCoV-229E, HCoV-NL, TGEV, and PEDV. In the S1 subsite of
these enzymes (cf. Fig. 3), N1649 is absolutely conserved, and L1702 is a non-Gly residue; in
the S2 subsite, Y1813 is absolutely conserved, and Y1804 is conservatively substituted by Phe
in some of the homologs. The occluded S1 and S2 subsites of all these enzymes are suitable for
recognition of P1-P2 di-glycine and appear to hinder accommodation of P1 and P2 side chains
larger than Ala. We expect the binding mode of the substrate P,-P, main chain to these
coronaviral PLpros to be similar as well, because of conservation of the hydrogen-bonding
residues G1811, D1704, and Y1813, and conservative substitutions of residues G1703 and Y1804.
Owing to the restricted nature of the S1 and S2 subsites, we term this group of coronaviral
PLpros the R-group. Overall, the binding site signature for the R-group of coronaviral PLpros
appears to be remarkably similar to that characteristic of USPs.°%6°

The coronaviral PL1pros from HCoV-OC43, HCoV-HKU1, BCoV, and MHV share a papain-like binding
site that is clearly distinct from that predicted for SCoV PLpro and form a second group. One
major difference from the R-group of coronaviral PLpros is seen in the putative S2 subsites of
these enzymes. Here, Y1813 and Y1804 are replaced by smaller residues, namely, Ser and Cys,
respectively. As in papain and related cathepsins, this opens the S2 pocket for the recognition
of bulkier P2 side chains (Fig. 5). Together with the replacement of D1704 by Tyr (another
papain-like substitution), this also eliminates three hydrogen bonds to the substrate P;-P,
main chain as modeled for SCoV PLpro. Replacement of G1811 and G1806, which are both conserved
in the R-group coronaviral PLpros, with larger residues may affect the conformation and
flexibility of the substrate-covering loop (loop
Y1804-Y1813, SCoV PLpro numbering). Interestingly, changes in the size of the S2 pocket also
impact the relative location of other subsites: the S4 subsite of R-group coronaviral PLpros
effectively forms the base of the S2 subsite in the O-group. For example, residues encompassing
positions T1841 and P1788, which putatively contribute to the P4 recognition in SCoV PLpro,
might actually impact P2 recognition in MHV PL1pro. The extent of the steric hindrance at the
S1 subsite in the SCoV PLpro model still differs from that in papain. Although the
papain-characteristic Gly replaces the bulkier L1702 in the O-group enzymes, a non-Gly residue
is still present at the N1649 position, which may nevertheless suffice in blocking
accommodation of large P1 side chains, as shown by mutation of the corresponding Gly27 in
cathepsin B.*® Owing to the open nature primarily at the S2 subsite but also at the S1 subsite,
we termed the second group of coronaviral PLpros the O-group. The presence of hydrophobic
residues at the putative oxyanion hole position is another interesting feature of O-group
coronaviral PLpros, contrasting with the hydrogen-bond-capable oxyanion-stabilizing residues
found in the R-group (Gln, Trp, or Thr), as well as in HAUSP and other USPs, FMDV Lpro, papain,
and related cathepsins (Asn or Gln).

Fig. 5. Opening of the S2 subsite due to the replacement of the bulky SCoV PLpro residues Y1804 and
Y1813 (conserved in the R-group coronaviral PLpros) by the smaller residues Cys and Ser found at these
positions, respectively, in the O-group coronaviral PLpros. The panel on the left shows the molecular surface of
SCoV PLpro with the surface patches associated with the side-chain atoms of residues Y1804 and Y1813
colored in cyan. The panel on the right shows the molecular surface of the SCoV PLpro double mutant
Y1804C,Y1813S with the surface patches belonging to the mutated side chains colored in yellow. Two
substrate P2 side chains, Cys and Arg, characteristic of the nsp1-nsp2 and nsp2-nsp3 processing site
sequences, respectively, cleaved by the O-group enzymes, are shown penetrating the S2 molecular surface of the
R-group SCoV PLpro (left panel), but being accommodated well in the spacious S2 pocket of the O-group-like
double mutant (right panel).

Although the IBV PLpro-binding site does not fit perfectly into the above bipartite
classification, it appears more related to the R-group of coronaviral PLpros. At the S1
subsite, the removal of the N1649 side chain through replacement by Gly does not generate a
more accessible S1 pocket because a bulkier Phe, in turn, replaces L1702. Similarly, although
the S2 pocket may become more spacious because of the replacement of Y1813 with Cys, the
conservative substitution of Y1804 by Phe is expected to still prevent the recognition of
large P2 side chains. Additionally, conservation of the SCoV PLpro residues D1704, G1811, and
G1806 suggests similarities in the binding mode of the substrate main chain between IBV
PLpro and the R-group of coronaviral PLpros.
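
The residue-level criteria used in this section can be condensed into a small rule-based sketch. The Python fragment below is only an illustration and not the procedure used in this study. It assumes that a single aromatic side chain at position 1804 or 1813 (SCoV PLpro numbering) is enough to occlude the S2 subsite, as argued above for IBV PLpro, and it counts non-Gly residues at positions 1649 and 1702 as a crude measure of S1 crowding. The names `Signature`, `s1_crowding`, `s2_restricted`, and `classify` are hypothetical, and `X` stands for the unspecified non-Gly residue found at the N1649 position in the O-group.

```python
from dataclasses import dataclass

AROMATIC = frozenset("FWY")


@dataclass(frozen=True)
class Signature:
    """One-letter residues at four signature positions (SCoV PLpro numbering)."""
    r1649: str  # lines the S1 subsite
    r1702: str  # lines the S1 subsite
    r1804: str  # lines the S2 subsite
    r1813: str  # lines the S2 subsite


def s1_crowding(sig: Signature) -> int:
    """Number of non-Gly residues lining S1 (0, 1, or 2)."""
    return sum(res != "G" for res in (sig.r1649, sig.r1702))


def s2_restricted(sig: Signature) -> bool:
    """Assumed rule: one aromatic side chain is enough to occlude S2."""
    return sig.r1804 in AROMATIC or sig.r1813 in AROMATIC


def classify(sig: Signature) -> str:
    """Return 'R' for a restricted S2 subsite and 'O' for an open one."""
    return "R" if s2_restricted(sig) else "O"


# Signatures taken from the residue substitutions described in the text.
EXAMPLES = {
    "SCoV PLpro": Signature("N", "L", "Y", "Y"),
    "O-group PL1pro": Signature("X", "G", "C", "S"),
    "IBV PLpro": Signature("G", "F", "F", "C"),
}

for name, sig in EXAMPLES.items():
    print(f"{name}: group {classify(sig)}, S1 crowding {s1_crowding(sig)}")
```

Under these assumptions the IBV signature falls into the R-group because Phe at position 1804 keeps S2 occluded, even though its S1 crowding matches that of the O-group example.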


Consistency of Coronaviral PLpro Classification 
with Experimental Data 


After Thiel and colleagues” demonstrated that SCoV PLpro cleaves at the nsp2-nsp3 boundary,
Baker and coworkers” showed that SCoV PLpro mediates cleavages at all three putative SCoV
PLpro processing sites. These occurred most likely at the highly conserved P4 to P1 motif
LXGG, consistent with earlier predictions.2” Baker and coworkers have also demonstrated
different P2 specificities for MHV PL1pro and PL2pro using extensive cleavage site-directed
mutagenesis of the polyprotein. For MHV PL1pro, these studies revealed a stringent requirement
for Gly in P1 and a preference for Arg at the P2 position, where several substitutions,
including Gly, precluded PL1pro cleavage.?4-%6 In contrast, the presence of Gly at both P1 and
P2 is critical for recognition and processing of the nsp3-nsp4 cleavage site by MHV
PL2pro.?? Liu and colleagues? 3% investigated the specificity of IBV PLpro by site-directed
mutagenesis at the p41 and p87 cleavage sites, which are equivalent to the nsp3-nsp4 and
nsp2-nsp3 sites, respectively, of the other coronavirus replicase polyproteins.”* These two
highly conserved cleavage sites feature Lys, Ala, and Gly at Ps, P2, and P1, respectively. A
Gly is also
found in P1'. Mutational data suggest that the presence of P1 Gly and P2 Ala, but not P1' Gly,
is essential for cleavage. The substrate specificities of HCoV-229E PL1pro and PL2pro were
established by determination of the polyprotein processing sites by sequence analysis in the
laboratories of Siddell and Ziebuhr.*®-?* Importantly, both enzymes exhibited overlapping
substrate specificities at the nsp2-nsp3 cleavage site,”’ and the two experimentally
confirmed PLpro-processing sites of HCoV-229E feature P1 Gly and P2 Gly/Ala.2»37

Fig. 6. Assignment of confirmed/predicted cleavage site sequences in coronavirus replicase polyproteins
and processing PLpros based on predicted requirements at the S1 and S2 subsites. nspX-nspY indicates
cleavage between nonstructural proteins X and Y of the polyprotein. The P1 and P2 positions of the cleavage
site sequence are highlighted on black background and are colored in cyan for small residues (Gly, Ala) and
yellow otherwise. The right column lists the PLpros responsible for the processing event at each site. Enzyme
names are given on black background if the respective cleavage event is supported by experimental data. The
annotated predicted PLpro-mediated cleavage sites were retrieved from the SwissProt (SW) and/or GenBank
(GB) databases, except those for HCoV-NL, HCoV-OC43, and HCoV-HKU1, which were derived by similarity
in this work, and for the TGEV nsp1-nsp2 and nsp3-nsp4 cleavage sites, reannotated in this work based on the
predicted binding site architectures. (a) The SW and GB annotations for the TGEV nsp1-nsp2 cleavage site are
ARTGRG110-AI and KIARTG108-RG, respectively. (b) The SW and GB annotation for the TGEV nsp3-nsp4
cleavage site is VSPKSG2388-SG. (c) The shown PEDV nsp3-nsp4 cleavage site corresponds to the GB
annotation; the SW annotation for this site is IANKKG,.,,.-AG. See Materials and Methods for nomenclature
and sequence accession numbers.

In summary, the confirmed sites processed by R-group coronaviral PLpros show a stringent
requirement for Gly/Ala in P1 and P2, which agrees with the restricted nature of the S1 and S2
subsites predicted for this group. The O-group MHV PL1pro processes the polyprotein at sites
with Gly/Ala at P1 and Arg/Cys at P2, which corresponds to the more open S2 subsite in this
group. Thus, our classification of coronaviral PLpros, which is based on the predicted
topology of the nonprimed side of the substrate-binding site, correlates with specificity and
activity data available for some of these enzymes (see also Fig. 6). It is interesting to note
that MHV PL1pro (O-group) and MHV PL2pro (R-group), in addition to their different substrate
specificities, also display distinct behaviors toward E-64d, a membrane-permeable derivative
of the cysteine protease-specific irreversible epoxysuccinyl inhibitor E-64. In virus-infected
cells, E-64d was shown to block the MHV PL1pro-mediated processing of nsp1 and nsp2.?°?+ MHV
PL2pro-mediated nsp2-nsp3 cleavage, however, appeared to be E-64d-insensitive.®' The molecular
basis for E-64d specificity can be attributed to a Leu residue that normally binds into the S2
subsite of most cellular PLpros.®”°? The steric occlusion of the S2 pocket in MHV PL2pro most
likely precludes the accommodation of large P2 substrate side chains or the bulky Leu side
chain of the E-64d inhibitor. In contrast, MHV PL1pro has an open papain-like S2 pocket, which
can accommodate bulky moieties, such as the side chains of Leu (from the E-64d inhibitor), Arg
(from the nsp1-nsp2 processing site), or Cys (from the nsp2-nsp3 processing site), but would
not establish a productive contact with a small Gly residue (Fig. 5).


Evaluation of Annotated PLpro-mediated
Processing Sites 


In the absence of experimental specificity data for other coronaviral PLpros, Figure 6
summarizes the PLpro-mediated processing site sequences as annotated in the SwissProt and
GenBank databases, based on similarity to confirmed processing sites. We recognize a
remarkable complementarity between these sites and our binding site-based classification of
coronaviral PLpros. Specifically, the majority of annotated processing sites for R-group
PLpros feature small residues (Gly, Ala) in P1 and P2, which is in agreement with the
restricted nature of the binding sites of the processing PLpros. For PEDV, however, there are
different predictions in the SwissProt and GenBank databases for the nsp1-nsp2 cleavage sites.
Our classification is consistent with the GenBank annotation. In the case of TGEV, the
annotated nsp1-nsp2 and nsp3-nsp4 PLpro-mediated cleavage sites contain larger P2 residues,
i.e., Arg (according to SwissProt) or Thr (according to GenBank) in the nsp1-nsp2 cleavage
site, and Ser for nsp3-nsp4. Given the restricted S1 and S2 binding subsites predicted for
both PL1pro and PL2pro of TGEV, we revised the nsp1-nsp2 cleavage site annotation to
A111-I112, one residue downstream of the SwissProt annotation, thus placing Ala and Gly in the
P1 and P2 positions, respectively. The nsp3-nsp4 cleavage site of TGEV may also be subject to
revision. Processing may rather occur between S2389 and G2390, which is also one residue
downstream of the current annotation. Although this reannotation positions Ser instead of Gly
in P1, it places Gly instead of Ser in P2, which binds the more restricted S2 subsite, and
displaces Pro from P4 to P5. In our model of SCoV PLpro, D1704, which is fully conserved in
the R-group PLpros (Fig. 4), forms a hydrogen bond to the P4 main-chain NH group (Fig. 3).
This hydrogen bond would be incompatible with the presence of a P4 Pro, as predicted by the
current database annotation, because Pro lacks a main-chain NH group. Another alternative
cleavage site, in our opinion less favorable, would be between G2390 and F2391, two residues
downstream of the current annotation, which, although it positions Pro in P6 and Gly in P1,
introduces Ser in the more restricted S2 subsite and places the bulky hydrophobic Phe in P1'
(unique among coronaviral PLpro cleavage sites).
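
The reasoning behind the TGEV reannotation can be followed with a short sketch that slides the scissile bond along a peptide and reports which R-group criteria each candidate site fails. This is an illustration under stated assumptions, not the annotation procedure itself: the two peptides are assembled here from the annotation strings quoted in the Fig. 6 legend and from the residues named above (KIARTGRGAI for nsp1-nsp2, VSPKSGSGF for nsp3-nsp4), the criteria are reduced to "P1 and P2 no larger than Ala" and "no Pro at P4", and the function names are hypothetical.

```python
from typing import List

SMALL = frozenset("GA")  # residues assumed to fit the restricted S1/S2 subsites


def residue_at(peptide: str, bond: int, n: int) -> str:
    """Residue at nonprimed position Pn.

    `bond` is the number of residues on the N-terminal side of the scissile
    bond, so P1 is peptide[bond - 1], P2 is peptide[bond - 2], and so on.
    """
    return peptide[bond - n]


def violations(peptide: str, bond: int) -> List[str]:
    """List the assumed R-group criteria that a candidate site fails."""
    if bond < 4 or bond > len(peptide):
        raise ValueError("need at least P4..P1 upstream of the bond")
    found = []
    if residue_at(peptide, bond, 1) not in SMALL:
        found.append("P1 larger than Ala")
    if residue_at(peptide, bond, 2) not in SMALL:
        found.append("P2 larger than Ala")
    if residue_at(peptide, bond, 4) == "P":
        found.append("P4 is Pro")
    return found


# Peptides reconstructed from the annotations (an assumption of this sketch).
NSP1_NSP2 = "KIARTGRGAI"
NSP3_NSP4 = "VSPKSGSGF"

CANDIDATES = [
    ("nsp1-nsp2, GenBank", NSP1_NSP2, 6),
    ("nsp1-nsp2, SwissProt", NSP1_NSP2, 8),
    ("nsp1-nsp2, revised", NSP1_NSP2, 9),
    ("nsp3-nsp4, annotated", NSP3_NSP4, 6),
    ("nsp3-nsp4, +1 residue", NSP3_NSP4, 7),
    ("nsp3-nsp4, +2 residues", NSP3_NSP4, 8),
]

for label, peptide, bond in CANDIDATES:
    site = f"{peptide[:bond]}|{peptide[bond:]}"
    print(f"{label:24s} {site:12s} {violations(peptide, bond) or 'compatible'}")
```

The sketch treats all criteria as equal. The text, in contrast, weighs a Ser in the more restricted S2 subsite more heavily than a Ser in S1, and it also considers the primed side (Phe in P1'), which this fragment does not model.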

In the O-group coronaviral PLpros, the nsp1-nsp2 and nsp2-nsp3 processing sites confirmed for
MHV PL1pro and the predicted corresponding sites for the PL1pros from HCoV-OC43, HCoV-HKU1,
and BCoV are highly conserved (Fig. 6). As presented earlier, bulky P2 residues (Arg, Cys) are
predicted to fit into the spacious S2 pocket of these enzymes (Fig. 5). Conversely, in the
coronaviral genomes that do not contain an O-type PLpro, the P1 and P2 side chains of both the
nsp1-nsp2 and the nsp2-nsp3 cleavage sites are reduced in size to fit the R-type PLpro
binding site.


On the Evolution of Coronaviral PLpros 


Our results base the classification of coronaviral PLpros on structural binding site
relationships, superseding previous classification attempts. Unlike a previously
reconstructed phylogenetic tree of coronaviral PLpros,”* for example, our classification does
not group the PL2pros of MHV and BCoV together with their PL1pros, but rather with the
PL1pros and PL2pros from HCoV-229E and TGEV. The putative representation of PLpros, whether
R- or O-type, in the primordial nsp3, and their evolution in the contemporary lineages of
coronaviruses,** remain speculative. It cannot be ruled out that the involvement of PLpros in
processes other than polyprotein processing has played a part in the diversification of their
structural relationships, and influenced the co-evolution of their cleavage sites.
Interestingly, the O-type signature along with its corresponding cleavage site sequences
appears to be less diverse (Figs. 4 and 6), possibly owing to a more recent evolutionary
origin than that of the R-type signature. Our results, however, suggest that a conversion of
PLpro specificity in either direction would have been associated with major structural active
site rearrangements, requiring considerable evolutionary pressure.


On the Predicted Deubiquitinating Activity of 
Coronaviral PLpros 


Our prediction of deubiquitinating activity of SCoV PLpro*° can safely be extended to the
R-group enzymes that cleave the polyprotein at sites that contain the motif LXGG in P4 to P1.
These enzymes are the PL2pros from HCoV-OC43, HCoV-HKU1, BCoV, and MHV (Fig. 6). In order to
comment on the ability of other R-group PLpros to deubiquitinate proteins, further
experimental and theoretical studies are needed to elucidate whether those coronaviral PLpros
can accommodate in their binding sites a P4 Leu and a P3 Arg, the residues found in ubiquitin.
Owing to the requirement for a bulky P2 residue®* and the predicted spacious S2 subsite
(Fig. 5), it is unlikely that the O-group PLpros will possess deubiquitinating activity. An
interesting observation is that, for those coronaviruses where a P4 Leu residue is found at
the replicase cleavage site processed by an R-group PLpro, the coronavirus also has an O-group
PLpro (i.e., HCoV-OC43, HCoV-HKU1, BCoV, and MHV). The R-group enzyme performs a single
cleavage of the polyprotein at nsp3-nsp4, whereas the O-group enzyme cleaves at nsp1-nsp2 and
nsp2-nsp3. The SARS coronavirus is an exception, because it has an R-group PLpro that
processes at sites containing Leu in the P4 position, but it lacks an O-group PLpro. In the
coronaviruses where there is no Leu in the P4 of the processing site, there are two R-group
enzymes, or in the case of IBV, only one R-group PLpro.
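
As a minimal illustration of the motif criterion used above, the following Python sketch tests whether the nonprimed side of a cleavage site ends in Leu-X-Gly-Gly at P4 to P1. It is not part of the original analysis. The function name `has_lxgg` is hypothetical, the ubiquitin C-terminal sequence LRLRGG is taken from general knowledge rather than from this paper, and VSPKSG is the annotated TGEV nsp3-nsp4 site quoted in the Fig. 6 legend.

```python
import re

# Leu at P4, any residue at P3, Gly at P2 and P1, anchored at the scissile bond.
LXGG = re.compile(r"L.GG$")


def has_lxgg(nonprimed: str) -> bool:
    """True if the last four residues (P4 to P1) match Leu-X-Gly-Gly."""
    return LXGG.search(nonprimed) is not None


print(has_lxgg("LRLRGG"))  # ubiquitin C-terminus: P4 Leu, P3 Arg
print(has_lxgg("VSPKSG"))  # annotated TGEV nsp3-nsp4 site
```

A sequence match of this kind only flags candidate substrates; whether a given PLpro can accommodate the P4 Leu and P3 Arg side chains remains a structural question, as stated above.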


CONCLUSION 


Owing to the wealth of protein 3D structural data coupled with the constant improvement of
fold recognition algorithms, a significant structural relationship could be detected between
the catalytic core domains of the cysteine proteases SCoV PLpro and HAUSP, both featuring a
circularly permuted Zn-ribbon domain inserted in the middle of a papain-like fold. One can
thus reconsider the current classification of coronaviral PLpros and USPs into families C16
and C19, respectively, in the MEROPS peptidase database (http://merops.sanger.ac.uk).7°
Comparative sequence analysis data superimposed onto a binding site structural framework show
that coronaviral PLpros can be classified into two groups according to their binding site
architectures. One group, termed R and present in all currently known coronaviruses, is
predicted to feature sterically restricted S1 and S2 substrate-binding subsites and a
P,-P,-substrate-binding mode characteristic of USPs. The other group, termed O, particularly
features an open S2 subsite and a substrate-binding mode that more closely resembles that of
papain and related cathepsins. This classification, which differs in part from those
extracted from the phylogenetic trees of coronaviral replicases and PLpro domains, is a first
step toward the understanding of the molecular basis for the processing specificity and
inhibition selectivity data that have become available for several members of the family. For
the remaining coronaviral PLpros, the R/O classification can be used to critically evaluate
and, in a few instances, to revise the publicly available annotations of polyprotein cleavage
sites. The ubiquitous presence of the R-group binding site in all coronaviruses can be
advantageously exploited to design PLpro inhibitors with a wide-spectrum efficacy against all
coronaviruses. Certainly, experimental structure determinations, at least for one family
member, will be valuable for a more reliable identification of those structural details that
may be required to overcome the predicted inhibitory cross-reactivity with host enzymes,
particularly with the USPs.


APPENDIX A
Circularly Permuted Zn-ribbon Domain of 
Coronaviral PLpros 


A recent report demonstrates that the previously uncharacterized finger domain, inserted
between the two subdomains of the papain fold of the HAUSP catalytic core domain, represents a
circularly permuted Zn-ribbon.** Although in HAUSP this domain has lost its zinc-binding
ability because of mutation of two of the Zn-chelating residues, intact Zn-chelating
capability appears to be present in a number of USPs (family C19 in the MEROPS database,
http://merops.sanger.ac.uk) that are close homologs of HAUSP (Fig. A1). Our model predicts
that the Zn-ribbon domain of SCoV PLpro resembles that of HAUSP, but differs essentially from
the previous model of HCoV-229E PL1pro, in which the topology of the corresponding sequence
was based on classical Zn-ribbons.?? Although this prediction can be fully validated only by
an experimental structure, there are several lines of evidence supporting the existence of a
circularly permuted instead of a classical Zn-ribbon topology for the intermediate domain of
coronaviral PLpros.

The first line of evidence is the fold recognition result itself, with the detection of HAUSP
as the only statistically significant structural template for SCoV PLpro. Sequence
comparisons (Fig. A1) suggest that the putative Zn-chelating residues of coronaviral PLpros
align onto the HAUSP residues corresponding precisely to the predicted Zn-chelating positions
of related USPs.** Second, the predicted secondary structure elements of coronaviral PLpros
correspond to those determined for the circularly permuted Zn-ribbon-like domain (previously
termed finger domain) of HAUSP.®* Notably, these two criteria also apply to a comparison with
the sequence of the circularly permuted Zn-ribbon fold from the structure of the silent
information regulator 2 (Sir2) homolog,® the only other known representative of this fold
(PDB code 1ICI; Fig. A1). In contrast, coronaviral PLpros can be readily aligned (i.e.,
preserving the spacing between secondary structure elements after alignment of Zn-chelating
residues) onto representatives of the classical C4-type Zn-ribbon fold only after assuming the
appropriate circular permutation of the latter (Fig. A1).

Fig. A1. Sequence comparison of the putative circularly permuted Zn-ribbon domains of SCoV PLpro and
selected coronaviral PLpros, the finger domain inserts of several USPs, and members of the rubredoxin-like
fold in the SCOP database,°® including classical and circularly permuted Zn-ribbons. gcp-ZF denotes a
genuine circularly permuted Zn finger. mcp-ZF denotes manually circularly permuted Zn fingers, with the
position of circular permutation indicated in the sequence by "( )". Secondary structure elements are rendered
as in Figure 1. Metal-chelating residues are highlighted on yellow background. Residues identical in half or
more of all sequences are in white on dark gray background, and those conserved in half or more of all
sequences are on light gray background, based on the BOXSHADE program (http://www.ch.embnet.org).
HCoV refers to HCoV-229E. See Materials and Methods for coronavirus nomenclature and sequence
accession numbers. The USPs selected along with HAUSP are from the SwissProt database: scUBP15
(Saccharomyces cerevisiae; SW: P50101); scUBP8 (S. cerevisiae; SW: P50102); scDOA4 (S. cerevisiae; SW:
P32571); and dFAF (Drosophila melanogaster; SW: P55824).
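
The manual circular permutation applied to the classical Zn-ribbons before alignment amounts to cutting the sequence at one position and rejoining the two segments in the opposite order. The Python sketch below illustrates this operation on a toy string. It is a hypothetical helper, not the tool used to prepare Fig. A1, and placing the "( )" marker at the junction of the original C- and N-termini is an assumption about the figure's notation.

```python
def circular_permute(seq: str, cut: int, mark: bool = True) -> str:
    """Move the first `cut` residues of `seq` to its C-terminal end.

    With mark=True, the junction between the original C- and N-termini is
    shown as "( )" (assumed here to mirror the notation of Fig. A1).
    """
    if not 0 <= cut <= len(seq):
        raise ValueError("cut must lie within the sequence")
    joint = "( )" if mark else ""
    return seq[cut:] + joint + seq[:cut]


toy = "ABCDEF"  # placeholder letters, not a real protein sequence
print(circular_permute(toy, 2))              # CDEF( )AB
print(circular_permute(toy, 2, mark=False))  # CDEFAB
```

Permuting again at the complementary position, `len(seq) - cut`, restores the original order, which makes it easy to map aligned positions back onto the unpermuted sequence.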

Further fold recognition data provide a third line of evidence for a circularly permuted
Zn-ribbon domain in coronaviral PLpros. When the sequence of the SCoV PLpro intermediate
domain was submitted to fold recognition servers, rubredoxins were top-ranked as structural
relatives by consensus scoring, albeit below the significance threshold of the 3D-JURY
method. The iron- instead of zinc-binding rubredoxins display the same overall fold topology
as circularly permuted Zn-ribbons according to the SCOP database
(http://scop.berkeley.edu).” As for USPs and naturally or manually circularly permuted
Zn-ribbons, the alignment of rubredoxins onto coronaviral PLpro sequences agrees with the
assignment of metal-chelating residues and secondary structure conservation (Fig. A1). Fold
recognition, however, failed to signal the genuine circularly permuted Zn-ribbons in the
structures of HAUSP and the Sir2 homolog, which is not surprising because in these cases the
Zn-ribbon domains are part of much larger protein structures. These failures rather reflect an
existing shortcoming of present fold recognition methods in correctly detecting suitable
template domains embedded in large multidomain structures. They do not necessarily imply that
these two genuine circularly permuted Zn-ribbons are more distant structural homologs of
coronaviral PLpros than rubredoxins. Importantly, classical Zn-ribbons were not detected even
though they are represented in the PDB as single-domain structures.

The fourth line of evidence is given by the 3D structural comparison of classical versus
circularly permuted Zn-ribbons (including rubredoxins, see Fig. A2). A classical Zn-ribbon
fold has its chain termini forming the outer β-strands of the β-sheet. In contrast, a
circularly permuted Zn-ribbon fold has its chain termini forming the inner strands of the
β-sheet. In both cases, the inner strands are generally longer than the outer ones. The
difference between the two folds is particularly striking at the N-terminus. In the classical
Zn-ribbon fold, the N-terminus forms a very short outer β-strand, which is even absent in some
of the fold representatives. In the circularly permuted Zn-ribbon fold family, including
rubredoxins, on the other hand, the N-terminus forms a long inner β-strand. Our prediction
that the Zn-ribbon domain of SCoV PLpro contains long β-strands at the sequence termini is
compatible with a circularly permuted fold.

As mentioned earlier, there is good agreement between the secondary structures predicted for
the intermediate domain in coronaviral PLpros and those observed for the Zn-ribbon domain of
HAUSP. The latter has two additional structural features outside the β-ribbon: a β-strand and
an α-helix in the crossover segment connecting the
outer β-strands of the circularly permuted β-ribbon (Fig. A2). Both these structural features
are utilized in HAUSP to anchor the Zn-ribbon domain to the protease domain. The isolated
β-strand is preserved in the modeled circularly permuted Zn-ribbon of SCoV PLpro. Similar to
the HAUSP structure, it establishes a parallel β-strand interaction with the protease domain
(Fig. 2). In contrast to HAUSP, however, the α-helix insertion is predicted to be absent from
the circularly permuted Zn-ribbon of SCoV PLpro, as is its interacting loop from the protease
domain. This suggests that the relative orientation of the protease and Zn-ribbon domains in
the coronaviral enzyme is less rigid than in HAUSP. This may have implications for ubiquitin
binding and enzyme regulation.

Fig. A2. Structural comparison between classical C4-type Zn-ribbons, rubredoxins, and circularly
permuted Zn-ribbons. The two structures shown in the upper row are representatives of a large family of the
classical C4-type Zn-ribbon fold, namely, from the RNA polymerase II subunit 9 (left, PDB code 1QYP:
residues 1-57) and from the transcription elongation factor SII (right, 1TFI: 1-50). The two structures shown in
the middle row are typical examples of rubredoxins, from Pseudomonas oleovorans (left, 1S24: 1-56), and
from Pyrococcus furiosus (right, 1BQ8: 1-54). The structures shown in the bottom row are the currently known
members of the circularly permuted C4-type Zn-ribbon fold, from the Sir2 homolog (left, 1ICI: 116-159) and
from HAUSP (middle, 1NBF: 325-399) together with the Zn-ribbon in the modeled structure of SCoV PLpro
(right, residues 1720-1779). Ribbons are colored using a rainbow color ramp starting from blue at the
N-terminus and ending in red at the C-terminus of each domain. The Zn2+ ions bound to classical and circularly
permuted Zn-ribbons are shown as magenta spheres. A Cd2+ ion and a Fe3+ ion bound to the exemplified
rubredoxins are shown as purple and green spheres, respectively. The metal-chelating cysteine residues are
also displayed, except for the circularly permuted Zn-ribbon of HAUSP that lost its Zn2+-binding capability and
retains two of the four metal-coordinating residues (Cys and His, also displayed). Red arrows indicate
additional secondary structure elements present outside the β-ribbon in the circularly permuted Zn-ribbon
domains of HAUSP and SCoV PLpro.

Although Zn2+, not Fe2+, has been established as an essential cofactor of HCoV-229E
PL1pro,®® it has to be considered that rubredoxins may also represent viable templates for
the modeling of the putative Zn-ribbon domain of coronaviral PLpros. In fact, sequence
similarities of the intermediate domain of coronaviral PLpros to some rubredoxins appear to
be even stronger than to HAUSP. In this regard, it is also interesting to note that Herold and
coworkers determined an increased amount of Fe2+ instead of Zn2+ bound to the protein when
supplementary zinc acetate was omitted from the bacterial growth medium during recombinant
expression.?® However, secondary structure similarities are predicted to be more pronounced
when SCoV PLpro is compared with HAUSP (Fig. A1). The principal difference between the
rubredoxin and HAUSP Zn-ribbon structures rests in the crossover segment outside of the
β-ribbon (Fig. A2), which in the HAUSP structure mediates the attachment to the protease
domain and participates in direct interactions with Ubal.*? In rubredoxins, on the other
hand, the crossover segment is a loop that folds against the β-sheet and does not reach to
the opposite edge of the β-sheet. Because several of the conserved residues that stabilize
the crossover segment in rubredoxins are different in SCoV PLpro, the crossover loop of
coronaviral PLpros may have a decreased propensity to fold against the β-sheet of the
Zn-ribbon domain, and may, therefore, become available for direct interaction with the
protease domain, as modeled in this study for SCoV PLpro (Fig. 2). Such a loop conformation
would further be expected to affect interdomain flexibility and ligand binding as seen in the
HAUSP-ubiquitin complex.°' Curiously, many of the HAUSP-related C19-family USPs, such as
UBP4, UBP15, and UBC11 from higher eukaryotes, feature an uncharacterized sequence insertion
of ~290 residues between the β-strand and the α-helix within the crossover loop of the
Zn-ribbon domain, as judged by a sequence family alignment that can be accessed through the
MEROPS database (http://merops.sanger.ac.uk).


Supplementary Material 


Supplementary Materials Details on the energy refine- 